Abstract — This paper proposes a power-aware algorithm based on voltage islands for constructing an X-clock tree with double via insertion. Different voltages are assigned to multiple voltage islands to reduce the total power consumption while keeping the clock delay under control. A high rate of double via insertion is achieved for via-effect avoidance and reliability. We first partition a clock network into a number of voltage islands, in an L-type or T-type arrangement, and construct the X-clock tree for each voltage island with double via insertion. Then, we combine these X-clock trees through a well-defined connection with inserted level shifters to minimize the power. The delay effect of the inserted double vias is also taken into account. Our approach is tested on ten benchmarks. Compared with a single voltage island, experimental results show that our X-clock tree based on multi-voltage islands saves an average of 21.58%, 4.75%, and 33.3% in power, delay, and running time, respectively.

Index Terms — X-clock tree, voltage island, clock delay, power consumption.

I. Introduction 

Reducing power consumption is a critical issue in contemporary chip design for nanometer process technology. An SoC (system-on-a-chip) integrates a number of functional blocks and usually has multi-mode operation, in which different sets of blocks work at different times [1]. If all the blocks are supplied with a uniform voltage, the same power is consumed throughout the working time. To save power, the voltage-island design methodology [2]-[4] assigns multiple supply voltages to the functional blocks in a system. For instance, performance-critical blocks, e.g., processors, require the highest supply voltage, while noncritical blocks, e.g., control logic and peripheral units, can operate at lower voltages.

Signals often have to be transmitted among the functional blocks of voltage islands via various buses or a clock at different times in a chip. We need to guarantee that these signals are correctly transferred among voltage islands. A level shifter (LS) has to be inserted into any interconnection that transmits a signal from a low-voltage island to a high-voltage one, because a circuit may suffer from excessive leakage energy when low-voltage gates directly drive high-voltage ones [1], [4]. In contrast, a level shifter is not required when a signal is transferred from a high-voltage island to a low-voltage one.

Many works have addressed voltage islands. Lee et al. [5] proposed voltage-island partitioning and floorplanning under timing constraints. Dong and Goto [6] presented a multi-voltage, level-shifter-driven floorplanning approach. A power-driven global routing approach for multi-voltage islands [7] was proposed for the evaluation of power reduction. Many works [8]-[13] concentrated on clock tree construction for a single voltage island. Tsai et al. [14] proposed a clock tree construction on multi-mode multi-voltage islands. Based on a binary clustering approach, they inserted buffers and adjusted their locations to minimize the clock delay and skew across the different modes. Su et al. [15] and Lin et al. [16] improved this approach by replacing some inserted buffers with adjustable delay buffers (ADBs) to control the clock delay under bounded skew. Kim [17] extended clock tree synthesis to 3D stacked ICs. Under the zero-skew requirement, the clock tree construction depends on the trade-off between the number of through-silicon vias (TSVs) and the total wire length. Deferred layer embedding (DLE) is used to reduce the number of TSVs, while deferred merge embedding (DME) is employed to minimize the total wire length. Chen et al. [18] proposed a 3D-IC clock tree that is constructed on the ASIC layer and associated with the predefined clock network on the platform layer. The TSVs on the ASIC layer projected from the platform layer were controlled to minimize the clock delay and skew. Lee et al. [19] extended the voltage-island technique to the concurrent optimization of power and temperature in 3D-stacked ICs. They adjusted the 2-layer 3D floorplan to partition the voltage islands so that heat flows better and the temperature decreases.

In this paper, we propose an approach to construct an X-clock tree on multi-voltage islands. With a predefined partition of the clock network according to the different supply voltages, the clock tree for each voltage island is constructed individually, and the trees are then combined with the associated level shifters to obtain the best trade-off between power and delay under zero skew. Compared with the original clock network (one island), our experiments show that the proposed approach consistently reduces power consumption under a reasonable clock delay.

The remainder of this paper is organized as follows. Section II describes the problem formulation. Section III states the power and delay estimation for a wiring model with level shifter and double via structures. Section IV presents the proposed algorithm of X-clock tree construction with double via insertion for power minimization. Experimental results on benchmarks are shown in Section V, followed by the conclusion of this work in Section VI.

II. Problem Formulation 

For a clock network problem with multiple islands using different supply voltages, there are generally two straightforward approaches. The first is to complete the whole clock tree routing and then insert level shifters in a post-refinement step that minimizes the clock delay and compensates the skew to zero. The second is to reduce the routing problem to a number of subproblems: it first divides the network into multi-voltage islands and routes the subclock tree for each voltage island, then combines all the subclock trees into a complete one with level-shifter insertion.

Fig. 1 illustrates the two approaches. As shown in Fig. 1(a), a chip consists of three voltage islands and twelve clock sinks. Islands 1, 2, and 3 operate at 1.0 V, 1.1 V, and 1.2 V, respectively. Fig. 1(b) shows the first approach, in which the clock tree is constructed directly to connect the twelve sinks located on the three voltage islands. Six level shifters are required for the interconnections from the low-voltage island (Island 1) to the two high-voltage islands (Islands 2 and 3). Fig. 1(c) shows the second approach, in which the clock tree combines three subclock trees (CLK_1, CLK_2, and CLK_3) with two level shifters that connect Island 1 (1.0 V) to Island 2 (1.1 V) and Island 3 (1.2 V), respectively. Compared with the first approach shown in Fig. 1(b), the second one saves four level shifters.

Fig. 1. Clock routings on (a) three voltage islands constructed from (b) approach A with six LSs and (c) approach B with two LSs.

We employ the second approach to construct a clock tree on multi-voltage islands. As shown in Fig. 1(c), the system clock source adopts the 1.0 V supply of Island 1, and two level shifters are required for driving Islands 2 and 3. That is, the two subclocks CLK_2 and CLK_3 are first combined, and then CLK_1 is integrated with them through two level shifters. Alternatively, we may select the 1.1 V supply of Island 2 as the supply voltage of the system clock source, such that the subclocks CLK_1 and CLK_3 are first combined and a level shifter is inserted when CLK_2 is integrated with them. Based on this discussion, we can estimate different combinations of subclocks that construct the whole-chip clock routing with distinct inserted level shifters, power consumption, and clock delay. Experimentally, power consumption and clock delay always trade off against each other; that is, higher power consumption yields a faster clock.

Due to the advanced lithography technologies, metal wires in a chip can be routed at arbitrary angles, in particular as diagonal (±45°) wires assigned to metal layers 3 and 4. X-architecture combines diagonal, horizontal, and vertical wires to achieve improvements of 10%, 20%, 30%, and 20% in chip performance, power consumption, die cost, and wirelength, respectively, compared with Manhattan architecture [20]. Our clock tree construction adopts this X-architecture.

Moreover, redundant-via insertion (RVI) is a well-known and effective method, highly recommended by semiconductor foundries, for reducing failed vias. As shown in Fig. 2(a), if a via-open defect (v_2) exists, the clock signal cannot be delivered to the clock sink s_t. RVI is also called double via insertion (DVI) because a redundant via r_i is inserted next to a single via v_i to form a double via structure, as shown in Fig. 2(b). When using the DVI method, we should follow the via width (v_w) and rule space (RS) to avoid creating any design-rule violation, as shown in Fig. 2(c). Double via insertion is also considered in our clock tree construction to improve yield and reliability.

Fig. 2. (a) The cross-section view of a double via. (b) The clock signal cannot be delivered to clock sink s_t if a via (v_2) is defective. (c) Design rules for single and redundant vias.

In summary, our problem of clock routing on multi-voltage islands is defined as follows.

Given a set of clock sinks on a set of multi-voltage islands, the objective is to construct a zero-skew X-clock tree with double via insertion for power minimization.

Currently, no papers have addressed clock tree construction based on X-architecture except our published PMXF approach, Tsai et al. [11], for constructing an X-clock tree. However, that approach considered only a single voltage island with double via insertion. The purpose of this paper is to extend the work to multiple voltage islands to minimize the power consumption under a reasonable clock delay.

III. Power and Delay Estimation

To evaluate the power consumption of a clock tree based on a single voltage island, the calculation of the total power should include the equivalent wire capacitances of the interconnections, the input capacitances of the inserted level shifters, and the loading capacitances of the multi-voltage islands. Thus, the total power $P_{total}$ is formulated as follows.

$$P_{total} = \sum_{\forall e_i} C_{load,i} \, F_{clk} \, V_{dd}^{2}, \quad (1)$$

where $C_{load,i}$, $F_{clk}$, and $V_{dd}$ are the capacitance of sink $i$ (or node $i$), the clock frequency, and the supply voltage, respectively. $e_i$ is defined as the set of clock tree edges along the path from the clock root to sink $i$.
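As a hedged illustration of (1), the following Python sketch sums the switching power of each capacitive load, assuming the standard dynamic-power form C·F·V². The function name `total_power`, the per-load `(capacitance, voltage)` tuples, and the example capacitances are illustrative assumptions and are not taken from the paper.

```python
def total_power(loads, f_clk_hz):
    """Sum C_load,i * F_clk * V_dd^2 over all loads, as in Eq. (1).

    loads    -- iterable of (capacitance in farads, supply voltage in volts)
    f_clk_hz -- clock frequency in hertz
    Returns the total power in watts.
    """
    return sum(c * f_clk_hz * v * v for c, v in loads)


# Hypothetical loads: two 1 pF sinks, one at 1.0 V and one at 1.2 V.
example_loads = [(1e-12, 1.0), (1e-12, 1.2)]
power_w = total_power(example_loads, 100e6)  # 100 MHz clock as in Table I
print(f"{power_w * 1e3:.3f} mW")
```

Because the power in this model scales with the square of the voltage, moving a load from 1.2 V to 1.0 V cuts its contribution by roughly 31%, which is the effect the voltage-island partition exploits.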

The fitted Elmore delay (FED) model [21] is widely used for the wire delay calculation in clock tree construction. A wire $j$ with width $w_j$ and length $l_j$ based on the FED model is shown in Fig. 3(a), where $r$, $c_a$, and $c_f$ are the sheet resistance, unit area capacitance, and fringing capacitance, respectively. The delay of wire $j$ with a loading capacitance $C_{L,i}$ at sink $i$ is formulated as follows.

$$Delay(j) = \frac{r\, l_j}{w_j} \left[ 0.5\,(D\, c_a w_j + E\, c_f)\, l_j + F\, C_{L,i} \right], \quad (2)$$

where the coefficients $D$, $E$, and $F$ are obtained by curve fitting [21].

A level shifter (LS) is inserted at the interface from a low-voltage island to a high-voltage island. As reported in [1], a level shifter consumes power and affects the delay. Fig. 3(b) shows that the equivalent circuit of a level shifter contains the intrinsic delay $T_{LS}$, input capacitance $c_{LS}$, and output resistance $r_{LS}$. When a level shifter drives wire $j$ with a loading capacitance $C_{L,i}$ shown in Fig. 3(a), the delay is formulated as follows.

$$Delay(i) = T_{LS} + \left( r_{LS} + \frac{r\, l_j}{w_j} \right) \left[ 0.5\,(D\, c_a w_j + E\, c_f)\, l_j + F\, C_{L,i} \right]. \quad (3)$$

Fig. 3. Equivalent circuits of (a) a wire j and (b) a level shifter.


For a double via, the inserted redundant via is always parallel to the single via, as shown in Fig. 4(a). Hence, the resistance of a double via is half that of a single via and its capacitance is double that of a single via. Fig. 4(b) shows the equivalent circuit of a double via, where k is two [20]. The delay is calculated in the same way as (2).

Fig. 4. (a) The cross-section view of a double via and (b) its equivalent circuit.
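The delay expressions (2) and (3) and the double-via model above can be evaluated with a short Python sketch. It uses the parameter values listed in Table I; the function names, the choice of femtoseconds as the unit (ohm times fF), and the example wire geometry and via values are assumptions made for illustration only.

```python
import math

# Technology parameters from Table I (130 nm).
R = 0.623                    # r: wire resistance parameter
C_A = 0.00598                # c_a: area capacitance parameter (fF)
C_F = 0.043                  # c_f: fringing capacitance parameter (fF)
D = 1.12673 * math.log(2)    # fitted FED coefficients
E = 1.10463 * math.log(2)
F = 1.04836 * math.log(2)
R_LS = 250.0                 # r_LS: level-shifter output resistance (ohm)
T_LS_FS = 54.4e3             # T_LS: 54.4 ps expressed in femtoseconds


def _cap_term(length_um, width_um, c_load_ff):
    """Bracketed capacitance term shared by Eqs. (2) and (3), in fF."""
    return 0.5 * (D * C_A * width_um + E * C_F) * length_um + F * c_load_ff


def wire_delay_fs(length_um, width_um, c_load_ff):
    """Eq. (2): FED delay of a wire driving c_load_ff (ohm * fF = fs)."""
    return (R * length_um / width_um) * _cap_term(length_um, width_um, c_load_ff)


def ls_wire_delay_fs(length_um, width_um, c_load_ff):
    """Eq. (3): the same wire driven through a level shifter."""
    r_total = R_LS + R * length_um / width_um
    return T_LS_FS + r_total * _cap_term(length_um, width_um, c_load_ff)


def double_via(r_via, c_via, k=2):
    """k parallel vias: resistance divided by k, capacitance multiplied by k."""
    return r_via / k, c_via * k


# Hypothetical wire: 100 um long, 1 um wide, driving a 10 fF load.
print(wire_delay_fs(100.0, 1.0, 10.0), ls_wire_delay_fs(100.0, 1.0, 10.0))
# Hypothetical single via of 4 ohm and 0.1 fF.
print(double_via(4.0, 0.1))
```

Sharing `_cap_term` between the two delay functions makes it visible that a level shifter only adds its intrinsic delay and its output resistance in series with the wire.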


IV. Algorithm of Power Aware Based on Voltage Islands for X-Clock Tree Construction

With multi-voltage islands, each island operates at a specified supply voltage so that the total power consumption of a chip can be reduced. We propose an algorithm of multi-voltage-island-based X-clock tree construction with double via insertion (MuVIX-DVI) that integrates all the voltage-island X-clock trees for power minimization. Fig. 5 shows the proposed MuVIX-DVI algorithm.

Algorithm: MuVIX-DVI (Multi-voltage-island-based X-clock tree construction with double via insertion)
Input: A set of voltage islands VI and a set of supplying voltages SV
Output: A multi-voltage-island-based X-clock tree with double via insertion for power minimization

SV_sys <- Determine the supplying voltage of the system clock source.
PMXF(VI);    /* Construct the X-clock tree for each voltage island ∈ VI */
DVI-X;       /* Insert double vias into the X-clock tree for each voltage island ∈ VI */
Let each constructed X-clock tree be a leaf node.
CS(VI) <- Obtain the connection sequences of VI based on SV_sys.
for each connection sequence vi ∈ CS(VI)
{
    CS(LS) <- Obtain the connection sequences for level-shifter insertion.
    do
    {   Make combination for each ls ∈ CS(LS)   }
    while (power is improved)
}

Fig. 5. The proposed MuVIX-DVI algorithm.

In the algorithm, given a set of multi-voltage islands VI and a set of supply voltages SV, the supply voltage of the system clock source, denoted SV_sys, is determined first. Then, our PMXF algorithm [11] constructs the X-clock tree for each voltage island in VI, and our DVI-X algorithm [12] is applied to each tree to insert double vias for improved yield and reliability. Third, we mark the constructed X-clock trees as leaf nodes. To integrate these leaf nodes of island-based X-clock trees for power minimization, all possible connection sequences of the voltage islands are enumerated, denoted CS(VI). For a connection sequence vi ∈ CS(VI) associated with the SV_sys of these islands, level shifters must be inserted into the interfaces from low- to high-voltage islands. The set of connection sequences with level-shifter insertion is denoted CS(LS). After that, we estimate the power consumption for each connection sequence ls ∈ CS(LS). Finally, we obtain a multi-voltage-island-based X-clock tree with the connection sequence that gives the minimum power consumption.
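The selection step described above can be sketched as an exhaustive search over orderings of the leaf nodes. This is a minimal sketch and not the authors' implementation: `best_connection_sequence` is a hypothetical helper, and `toy_power` is an arbitrary stand-in cost with no physical meaning, used only so the example runs. A real estimator would apply (1) to the combined tree with its level shifters.

```python
from itertools import permutations


def best_connection_sequence(leaf_nodes, estimate_power):
    """Return the ordering of leaf nodes with the lowest estimated power.

    estimate_power is a caller-supplied function mapping a connection
    sequence (a tuple of leaf-node names) to a power estimate.
    """
    return min(permutations(leaf_nodes), key=estimate_power)


# Island voltages in millivolts, matching the 1.0 V / 1.1 V / 1.2 V example.
voltages_mv = {"leaf1": 1000, "leaf2": 1100, "leaf3": 1200}


def toy_power(sequence):
    """Arbitrary stand-in cost: weight each island's voltage by its position."""
    return sum((pos + 1) * voltages_mv[name] for pos, name in enumerate(sequence))


candidates = list(permutations(voltages_mv))
print(len(candidates))  # number of connection sequences for three leaf nodes
print(best_connection_sequence(voltages_mv, toy_power))
```

Enumerating every permutation is only practical because the number of islands is small compared with the number of sinks.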

A. Determination of System Clock Supplying Voltage 

Before constructing the system clock tree that connects all the island-based subclock trees, we first define the supply voltage $SV_{sys}$ of the system clock source as follows.

$$SV_{sys} = \min SV_k \quad (4)$$

Each voltage island $vi_k \in VI$ can operate at several supply voltages $SV_k = \{sv_1, sv_2, \ldots\}$. In this work, we set the lowest supply voltage of all the voltage islands as $SV_{sys}$ in the expectation of minimum power consumption, although some level shifters are then required at the interfaces from low- to high-voltage islands.
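A minimal sketch of this rule, using the three-island example of Fig. 1, is given below. The dictionary layout and the function names are assumptions for illustration; each island is given a single operating voltage here to keep the example small.

```python
def system_clock_voltage(supply_sets):
    """Eq. (4): the lowest supply voltage over all voltage islands."""
    return min(v for voltages in supply_sets.values() for v in voltages)


def needs_level_shifter(operating_voltage, sv_sys):
    """An island above SV_sys is reached through a low-to-high interface."""
    return operating_voltage > sv_sys


# Supply voltages of the three islands in the running example.
supply_sets = {"Island 1": [1.0], "Island 2": [1.1], "Island 3": [1.2]}

sv_sys = system_clock_voltage(supply_sets)
shifted = [name for name, voltages in supply_sets.items()
           if needs_level_shifter(voltages[0], sv_sys)]
print(sv_sys, shifted)
```

With the source at 1.0 V, Islands 2 and 3 are the ones that require level shifters, in line with the two level shifters of Fig. 1(c).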

B. PMXF: X-Clock Tree Construction on Single Voltage Island

We apply our PMXF algorithm [11] to construct an X-clock tree for each voltage island. Fig. 6 shows the outline of the PMXF algorithm; the details are given in [11] due to limited space. An example of X-clock tree construction with 8 sinks, s_1 to s_8, using the PMXF algorithm is shown in Fig. 7.

Algorithm: PMXF (Pattern-Matching X-architecture clock routing with X-Flip)
Input: A set of n clock sinks S and an X-pattern library for a pair of points
Output: An X-architecture clock tree with zero skew and minimal delay

for h <- 0 to ⌈log_2 n⌉
    while (un-paired node in S_h)                  /* Check |S_h| > 1 or not. */
        (s_i, s_j) <- DPPG(S_h);                   /* Determine a pair of sinks/points with GMA. */
        PMX(s_i, s_j);                             /* Select the X-pattern of s_i and s_j. */
        P_t <- DCTP(s_i, s_j, PTN(s_i, s_j), x);   /* Obtain tapping point P_t and zero-skew ratio x of (s_i, s_j). */
        X-Flip(s_i, s_j);                          /* Use X-Flip technique to reduce wire length of (s_i, s_j). */
        if (x < 0) WireSizing(s_i, s_j);           /* Size wire width w_j. */
        if (x > 1) WireSizing(s_j, s_i);           /* Size wire width w_i. */
        Insert(S_{h+1}, P_t);                      /* Insert P_t into the set of points at the (h+1)th level. */

Fig. 6. The PMXF algorithm constructs an X-clock tree for each voltage island.

Fig. 7. An example of an X-clock tree with 8 sinks. Legend: Metal 1 (horizontal), Metal 2 (vertical), Metal 3 and Metal 4 (diagonal).

C. DVI-X: Double Via Insertion for X-Clock Tree

Our DVI-X algorithm [12] is suitable for double via insertion into an X-clock tree. Fig. 8 shows the outline of the DVI-X algorithm; the details are given in [12].

Fig. 8. The DVI-X algorithm for inserting double vias into an X-clock tree.

Fig. 9. An example of a partial X-clock layout; layout (b) is layout (a) after inserting double vias. Legend: redundant via (RV), via, Metal 1 to Metal 4.

D. Combination of X-Clock Trees among Voltage Islands

Following subsections IV.B and IV.C, given a chip with several voltage islands as shown in Fig. 10(a), PMXF first constructs the X-clock subtree for each voltage island and then DVI-X inserts double vias into the subtree. Fig. 10(b) shows the X-clock subtree for Island 2. A system clock source enters the clock source CLK_2 of Island 2 to drive all the clock sinks synchronously. Here, we let its clock source CLK_2 be a leaf node, denoted Leaf-node_2, and write its supply voltages as SV_2 = {sv_1, sv_2, ...}. Similarly, Leaf-node_3 and SV_3 represent the clock source and supply voltages of Island 3, respectively.

Fig. 10. (a) Given three voltage islands, (b) PMXF and DVI-X construct the X-clock subtree of Island 2 with double via insertion, and (c) the subtree is labelled as a leaf node.

To construct a multi-voltage-island-based X-clock tree, we need to know how to connect these islands with different supply voltages to achieve minimum power consumption. Since the construction of the X-clock tree is based on a binary tree structure, the number of connection sequences is k! if there are k leaf nodes. The three voltage islands shown in Fig. 10(a) are labelled Leaf-node_1 for Island 1, Leaf-node_2 for Island 2, and Leaf-node_3 for Island 3, with supply voltages of 1.0 V, 1.1 V, and 1.2 V, respectively. Hence, there are six connection sequences (i.e., 3!), denoted CS(VI) = {vi_1, vi_2, vi_3, vi_4, vi_5, vi_6}. For vi_1 ∈ CS(VI), as shown in Fig. 11(a), Leaf-node_1 and Leaf-node_2 are connected first and then they are associated with Leaf-node_3 to complete the voltage-island-based X-clock tree. Fig. 11(b) shows the other connection sequence, vi_6 ∈ CS(VI).


with combining three leaf-nodes and inserting the required level shifters. For vi_1 ∈ CS(VI) shown in Fig. 12(a), SV_sys is 1.0 V because Islands 1-3 operate at 1.0 V, 1.1 V, and 1.2 V, respectively. The inserted level shifters LS_1 and LS_2 deliver the system clock source at 1.0 V to Leaf-node_3 at 1.2 V and Leaf-node_2 at 1.1 V, respectively. This is the connection sequence for vi_1 with level-shifter insertion. On the other hand, Figs. 12(b) and 12(c) show two connection sequences for vi_6 with different level-shifter insertions. In Fig. 12(c), LS_1 delivers the system clock source at 1.0 V to Leaf-node_2 at 1.1 V, and LS_2 converts the clock signal from 1.1 V to 1.2 V. Hence, each connection sequence of multi-voltage islands has at least one connection sequence with level-shifter insertion.

Fig. 12. The connection sequences with inserted level shifters for (a) vi_1, and (b) and (c) vi_6.
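To make the two insertion styles concrete, the sketch below counts level shifters over the driver-to-receiver interfaces of a combined tree. The edge-list representation and the name `count_level_shifters` are assumptions for illustration; the two edge lists only mimic the direct style of Fig. 12(a) and the cascaded style of Fig. 12(c) and are not extracted from the figures.

```python
def count_level_shifters(edges):
    """Count interfaces where a lower-voltage driver feeds a higher-voltage receiver.

    edges -- iterable of (driver voltage, receiver voltage) pairs in volts
    """
    return sum(1 for v_from, v_to in edges if v_to > v_from)


# Direct style: the 1.0 V source drives the 1.1 V and 1.2 V leaf nodes separately.
direct = [(1.0, 1.0), (1.0, 1.2), (1.0, 1.1)]
# Cascaded style: 1.0 V is shifted to 1.1 V, which is then shifted to 1.2 V.
cascaded = [(1.0, 1.0), (1.0, 1.1), (1.1, 1.2)]

print(count_level_shifters(direct), count_level_shifters(cascaded))
```

Both styles use two level shifters here, so the choice between them is decided by the resulting power and delay rather than by the shifter count alone. A high-to-low interface such as (1.2, 1.0) contributes nothing to the count.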

E. Power and Delay Estimation with Inserted Level Shifters 

When the clock signal is delivered from a voltage island operating at SV_sys to another island whose supply voltage is higher than SV_sys, a level shifter has to be inserted. Fig. 13(a) shows Islands 1 and 2 operating at 1.0 V and 1.1 V, respectively. When the two islands are connected, a level shifter is inserted to deliver the clock signal from the system clock source to Leaf-node_2 of Island 2. To calculate the clock delays from the system clock source to Leaf-node_1 and Leaf-node_2 with (2) and (3), respectively, Fig. 13(b) shows the equivalent model of Fig. 13(a) based on the FED model.

Fig. 13. (a) Two connected voltage islands with level-shifter insertion and (b) their equivalent circuit for delay calculation.


From the above discussion, different connection sequences with inserted level shifters result in different clock delays. At the same time, different power consumptions are obtained from (1). Finally, we select the connection sequence with minimum power under a reasonable clock delay.


F. Analysis of Time Complexity 

Given a set of n clock sinks in a set of m voltage islands, the proposed MuVIX algorithm shown in Fig. 5 completes the design of the multi-voltage-island-based X-clock tree. PMXF [11] constructs the X-clock tree for each voltage island in O(n log n). DVI-X [12] inserts double vias for each voltage island in O(k), where k is the number of single vias. For each connection sequence, it takes O(m log m) to combine m leaf nodes with inserted level shifters. Because we always choose the lowest supply voltage as the supply voltage of the system clock source, the number of connection sequences searched for the minimum power consumption is less than m!. Moreover, m ≪ n, m ≪ k, and k is proportional to n. Hence, the time complexity of the MuVIX algorithm is O(n log n) + O(k).


V. Experimental Results 

The proposed power-aware MuVIX algorithm based on voltage islands for X-clock tree construction has been implemented in C/C++ and run on an MS-Windows 8.1 machine with a dual-core Intel i7 CPU at 2.2 GHz and 8 GB of RAM. Table I lists the technology parameters of the FED delay model [21] and the level shifter (LS) under a 130 nm process [22] for power and delay calculation. The tested benchmarks are IBM r1-r5 [8], ISCAS89 s1423, s5378, and s15850 [9], and MCNC Primary1-2 [10].

Table I. Technology parameters of the FED delay model [21] and a level shifter [22] under 130 nm.

Parameter     Value     Parameter  Value         Parameter   Value
r (Ω/μm)      0.623     D          1.12673 ln2   r_LS (Ω)    250
c_a (fF/μm)   0.00598   E          1.10463 ln2   c_LS (fF)   23.5
c_f (fF/μm)   0.043     F          1.04836 ln2   T_LS (ps)   54.4
                                                 F_clk (Hz)  100M


In all the experiments, a benchmark is first partitioned into two voltage islands (L-type) or three voltage islands (T-type1 or T-type2). The X-clock tree is constructed for each partitioned voltage island using the PMXF algorithm [11], followed by the DVI-X algorithm [12] for double via insertion with skew tuning for skew minimization. After that, we extend the PMXF algorithm to integrate all the island-based sub-X-clock trees, and level shifters are inserted wherever the clock signal is delivered from a low-voltage island to a high-voltage island. Thus, we obtain several different connections depending on the sequence of islands with their supply voltages and level shifters. Finally, we select the one whose power consumption is minimized under a reasonable clock delay. Fig. 14 presents the PMXF integrated environment used to run these experiments.



Fig. 14. The platform of PMXF integrated environment. 


Fig. 15 shows the three partition types, L-type, T-type1, and T-type2, for a benchmark. An L-type partition consists of two voltage islands, island 1 at 1.0 V and island 2 at 1.2 V, and the height of island 2 may vary in the range of h/2 to 2h/3, where h is the height of the benchmark. A T-type1 partition consists of three voltage islands, island 1 at 1.0 V, island 2 at 1.1 V, and island 3 at 1.2 V, and the height of island 2 or island 3 may vary in the range of h/3 to 2h/3. A T-type2 partition consists of three voltage islands, island 1 at 1.0 V, island 2 at 1.1 V, and island 3 at 1.2 V, and the height of island 2 or island 3 may vary in the range of h/4 to 2h/3.

Fig. 15. The partition of (a) L-type, (b) T-type1, and (c) T-type2 voltage islands.


Table II shows the results of the single voltage-island X-clock trees with and without double via insertion in terms of vias, power consumption, clock delay, and CPU time. In the experiments, the rate (#Dvia / #via) of double via insertion (DVI) is 91.81% on average. This high DVI rate can improve yield and reliability during manufacturing. The power and delay with double via insertion (wDVI) are always slightly larger than those without double via insertion (woDVI). The ratio (wDVI / woDVI) in the table is defined as the power (delay) with DVI divided by that without DVI. In summary, the power and delay with DVI increase by 0.084% and 0.095% on average, respectively. The CPU time of PMXF is always larger than that of DVI-X, and both CPU times are proportional to the number of sinks or inserted double vias.
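The average DVI rate quoted above can be checked directly from the via counts in Table II. The sketch below is plain arithmetic on the tabulated values; only the variable names are introduced here.

```python
# (#via, #Dvia) per benchmark, copied from Table II.
via_counts = {
    "r1": (1222, 1113), "r2": (2840, 2583), "r3": (3995, 3676),
    "r4": (9166, 8401), "r5": (14665, 13446),
    "Primary1": (1195, 1122), "Primary2": (2335, 2159),
    "s1423": (350, 323), "s5378": (771, 698), "s15850": (2704, 2475),
}

# DVI rate of each benchmark: inserted double vias over single vias.
rates = {name: dvia / via for name, (via, dvia) in via_counts.items()}
average_rate = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    print(f"{name:9s} {rate:.4f}")
print(f"average DVI rate: {average_rate:.2%}")
```

The per-benchmark rates all lie between roughly 90% and 94%, so the average is not dominated by any single circuit.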

Table III compares the L-type partition with the single voltage island, both with double via insertion. On average, the L-type partition saves 21.85% (i.e., 1-0.7815) in power, 6.08% (i.e., 1-0.9392) in delay, and 18.19% (i.e., 1-0.8181) in CPU time, while the number of vias increases by 0.87% (i.e., 1.0087-1).

Table IV shows the results of T-type1. Because each benchmark is divided into three voltage islands, it has different combined connections that offer different power and delay. The table presents three cases for each benchmark against its single voltage island: lower power first, smallest delay first, and balanced power and delay. From the table, we find that no single case is always the best for all the benchmarks. Thus, the rule for determining the best one is to minimize the power consumption as far as possible under the delay control. The best result in power and delay for each benchmark is marked in bold in the table. Similar results for T-type2 are shown in Table V.
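The selection rule can be sketched as a constrained minimum: keep the cases that satisfy a delay bound and take the one with the lowest power. The candidates below are the three T-type1 cases of benchmark r4 from Table IV. The delay bound, set here to the single-island delay of r4, and the helper name `pick_min_power` are assumptions for illustration, so the result need not coincide with the entries marked as best in the paper.

```python
def pick_min_power(candidates, max_delay_ns):
    """Return the lowest-power candidate whose delay is within the bound."""
    feasible = [c for c in candidates if c["delay_ns"] <= max_delay_ns]
    if not feasible:
        return None  # no case meets the delay bound
    return min(feasible, key=lambda c: c["power_mw"])


# T-type1 cases for benchmark r4, copied from Table IV.
r4_cases = [
    {"case": "lower power first", "power_mw": 488.249, "delay_ns": 4708.278},
    {"case": "smallest delay first", "power_mw": 500.119, "delay_ns": 3539.556},
    {"case": "balanced", "power_mw": 495.312, "delay_ns": 3972.278},
]

# Assumed bound: the single-island delay of r4 (4101.838 ns in Table IV).
choice = pick_min_power(r4_cases, max_delay_ns=4101.838)
print(choice["case"], choice["power_mw"], choice["delay_ns"])
```

Under this assumed bound the lowest-power case is rejected for exceeding the delay limit, which illustrates why no single case is always best.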

All the best results selected from Tables IV and V are listed in Tables VI and VII, respectively. We combine the results of Tables VI and VII and compare them with the results using a single voltage island. On average, the power, delay, and CPU time are reduced by 21.3% (i.e., [(1-0.7939)+(1-0.7802)] / 2), 3.43% (i.e., [(1-1.0011)+(1-0.9304)] / 2), and 48.4% (i.e., [(1-0.4146)+(1-0.6175)] / 2), respectively, while the number of vias increases by 0.22% (i.e., [(1.0002-1)+(1.0042-1)] / 2).
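The T-type averages above follow from the mean ratios of the two T-type partitions. The sketch below repeats that arithmetic with the ratios quoted in the text; the dictionary keys and the helper name are introduced only for this example.

```python
# Average ratios (T-type vs single island) quoted in the text for Tables VI and VII.
t_type1 = {"power": 0.7939, "delay": 1.0011, "cpu": 0.4146, "via": 1.0002}
t_type2 = {"power": 0.7802, "delay": 0.9304, "cpu": 0.6175, "via": 1.0042}


def mean_saving_percent(key):
    """Mean saving of the two T-type partitions for one metric, in percent."""
    return 100 * ((1 - t_type1[key]) + (1 - t_type2[key])) / 2


for key in ("power", "delay", "cpu"):
    print(f"{key}: {mean_saving_percent(key):.3f}% saved")
# A negative saving in vias is an increase, so the sign is flipped for display.
print(f"via: {-mean_saving_percent('via'):.3f}% more")
```

The T-type1 delay ratio is slightly above 1, so the delay saving comes entirely from T-type2.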

Moreover, averaging over all the multi-voltage islands (L-type and T-type) against the single voltage island, the power, delay, and CPU time are reduced by 21.58% (i.e., (21.85%+21.3%) / 2), 4.75% (i.e., (6.08%+3.43%) / 2), and 33.3% (i.e., (18.19%+48.4%) / 2), respectively, while the number of vias increases by 0.55% (i.e., (0.87%+0.22%) / 2).

Fig. 16 presents the X-clock trees of the (a) L-type, (b) T-type1, and (c) T-type2 voltage islands with double via insertion for benchmark r5.

Fig. 16. X-clock trees of benchmark r5 based on (a) L-type, (b) T-type1, and (c) T-type2 voltage islands.

VI. Conclusion 

The X-clock tree is constructed with consideration of one of the DFM issues, double via insertion (DVI), for via-effect avoidance, and the DVI rate is always over 90% for reliable manufacturing. When multi-voltage island modes such as L-type or T-type are applied in advance to the DVI-based X-clock tree construction, the power consumption is efficiently reduced, as are the clock delay and running time, compared with a single island. Future work is to consider different partition models depending on the voltages and the clock sink distribution on a 3D chip [17]-[19], [23] and to integrate them into a well-defined clock tree under power and delay control.

Table II. Results of single voltage-island X-clock trees with/without double via insertion in via, power, delay, and CPU time.


                   Vias                   Power (mW)                   Delay (ns)                   CPU time (s)
Benchmark  #Sinks  #via   #Dvia  ratio    woDVI     wDVI      ratio    woDVI     wDVI      ratio    PMXF     DVI-X
r1         267     1222   1113   0.9108   80.12     80.237    1.00146  278.176   278.317   1.00051  35.71    0.374
r2         598     2840   2583   0.9095   194.889   195.091   1.00104  858.143   858.636   1.00057  163.9    1.357
r3         862     3995   3676   0.9202   261.838   261.996   1.00060  1452.224  1453.014  1.00054  29.68    2.574
r4         1903    9166   8401   0.9165   612.497   612.901   1.00066  4099.221  4101.838  1.00064  270.753  11.014
r5         3101    14665  13446  0.9169   1012.692  1013.366  1.00067  6934.2    6938.616  1.00064  974.752  28.314
Primary1   269     1195   1122   0.9389   169.068   169.102   1.00020  38.369    38.412    1.00112  2.92     0.375
Primary2   603     2335   2159   0.9246   417.055   417.13    1.00018  220.477   220.591   1.00052  16.98    1.42
s1423      74      350    323    0.9229   6.907     6.914     1.00101  6.507     6.518     1.00169  0.11     0.032
s5378      179     771    698    0.9053   17.549    17.569    1.00114  14.437    14.459    1.00152  4.53     0.203
s15850     597     2704   2475   0.9153   65.062    65.155    1.00143  43.441    43.517    1.00175  9.77     1.282
Average                          0.9181                       1.00084                      1.00095




Table III. Comparison of L-type vs single voltage-island X-clock trees with considering double via insertion in power, delay, 

and CPU time. 


Benchmark  Total via                  Power (mW)                   Delay (ns)                   CPU time (s)
           Single   L-type   Ratio    Single    L-type    Ratio    Single    L-type    Ratio    Single   L-type   Ratio
r1         2335     2362     1.0116   80.237    59.901    0.7466   278.317   284.837   1.0234   16.056   3.705    0.2308
r2         5423     5453     1.0055   195.091   147.617   0.7567   858.636   853.203   0.9937   87.101   15.211   0.1746
r3         7671     7919     1.0323   261.996   211.568   0.8075   1453.014  1344.603  0.9254   16.347   18.034   1.1032
r4         17567    17843    1.0157   612.901   491.303   0.8016   4101.838  3011.99   0.7343   194.833  132.573  0.6804
r5         28111    26581    0.9456   1013.366  813.438   0.8027   6938.616  6377.82   0.9192   523.916  654.573  1.2494
Primary1   2317     2243     0.9681   169.102   129.026   0.7630   38.412    44.539    1.1595   2.489    2.492    1.0012
Primary2   4494     4373     0.9731   417.13    322.762   0.7738   220.591   150.233   0.6812   11.109   7.593    0.6835
s1423      673      705      1.0476   6.914     5.312     0.7683   6.518     6.147     0.9431   0.142    0.152    1.0704
s5378      1469     1608     1.0946   17.569    15.008    0.8542   14.459    13.571    0.9386   4.733    6.333    1.3381
s15850     5179     5145     0.9934   65.155    48.266    0.7408   43.517    46.732    1.0739   11.052   7.18     0.6497
Average                      1.0087                       0.7815                       0.9392                     0.8181
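
The ratio columns and the Average row of Table III can be reproduced from the raw power figures. The sketch below assumes that each ratio is the L-type value divided by the single-island value and that the Average row is the arithmetic mean of the per-benchmark ratios. The paper does not state this explicitly, but the assumption is consistent with the reported average power ratio of 0.7815. The names `TABLE_III_POWER` and `ratio_summary` are illustrative only.

```python
# Power columns of Table III: (benchmark, single-island mW, L-type mW).
TABLE_III_POWER = [
    ("r1", 80.237, 59.901),
    ("r2", 195.091, 147.617),
    ("r3", 261.996, 211.568),
    ("r4", 612.901, 491.303),
    ("r5", 1013.366, 813.438),
    ("Primary1", 169.102, 129.026),
    ("Primary2", 417.13, 322.762),
    ("s1423", 6.914, 5.312),
    ("s5378", 17.569, 15.008),
    ("s15850", 65.155, 48.266),
]


def ratio_summary(rows):
    """Return per-benchmark ratios (multi-island / single-island) and their mean.

    Assumption: the table's Average row is the arithmetic mean of the ratios.
    """
    ratios = {name: multi / single for name, single, multi in rows}
    average = sum(ratios.values()) / len(ratios)
    return ratios, average


ratios, average = ratio_summary(TABLE_III_POWER)
for name, value in ratios.items():
    print(f"{name:10s} {value:.4f}")
print(f"{'Average':10s} {average:.4f}")
```

A ratio below 1 means the L-type voltage-island tree uses less power than the single-island tree on that benchmark. The same computation applies to the via, delay, and CPU-time columns.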


Table IV. Comparison of T-type1 vs. single voltage-island X-clock trees, considering double via insertion, in power and delay.


Benchmark  Single island         Lower power first for T-type1         Smallest delay first for T-type1      Balanced power & delay for T-type1
           P (mW)    Delay (ns)  P (mW)    Ratio   Delay (ns)  Ratio   P (mW)    Ratio   Delay (ns)  Ratio   P (mW)    Ratio   Delay (ns)  Ratio
r1         80.237    278.317     59.691    0.7439  427.582     1.5363  62.207    0.7753  288.858     1.0379  62.116    0.7742  299.385     1.0757
r2         195.091   858.636     148.002   0.7586  1250.522    1.4564  150.909   0.7735  921.493     1.0732  150.005   0.7689  1078.489    1.256
r3         261.996   1453.014    211.714   0.8081  1591.417    1.0953  217.006   0.8283  1336.342    0.9197  212.448   0.8109  1595.594    1.0981
r4         612.901   4101.838    488.249   0.7966  4708.278    1.1478  500.119   0.816   3539.556    0.8629  495.312   0.8081  3972.278    0.9684
r5         1013.37   6938.616    782.342   0.772   7953.024    1.1462  808.245   0.7976  7091.42     1.022   790.549   0.7801  7668.044    1.1051
Primary1   169.102   38.412      132.392   0.7829  70.796      1.8431  135.808   0.8031  42.198      1.0986  135.332   0.8003  47.789      1.2441
Primary2   417.13    220.591     319.622   0.7662  220.743     1.0007  333.061   0.7985  167.825     0.7608  329.334   0.7895  175.559     0.7959
s1423      6.914     6.518       5.179     0.7491  9.342       1.4333  5.44      0.7868  6.98        1.0709  5.44      0.7868  7.038       1.0798
s5378      17.569    14.459      13.535    0.7704  20.23       1.3991  14.112    0.8032  15.068      1.0421  13.992    0.7964  16.882      1.1676
s15850     65.155    43.517      48.224    0.7401  61.844      1.4211  50.532    0.7756  42.468      0.9759  50.397    0.7735  42.714      0.9815
Average                                    0.7688              1.3479            0.7958              0.9864            0.7889              1.0772


Table V. Comparison of T-type2 vs. single voltage-island X-clock trees, considering double via insertion, in power and delay.


Benchmark  Single island         Lower power first for T-type2         Smallest delay first for T-type2      Balanced power & delay for T-type2
           P (mW)    Delay (ns)  P (mW)    Ratio   Delay (ns)  Ratio   P (mW)    Ratio   Delay (ns)  Ratio   P (mW)    Ratio   Delay (ns)  Ratio
r1         80.237    278.317     59.851    0.7459  300.465     1.0796  65.776    0.8198  285.819     1.027   65.809    0.8202  285.82      1.027
r2         195.091   858.636     140.134   0.7183  823.275     0.9588  150.371   0.7708  630.627     0.7345  149.624   0.7669  631.169     0.7351
r3         261.996   1453.014    205.05    0.7826  1378.38     0.9486  216.096   0.8248  1160.669    0.7988  211.387   0.8068  1196.014    0.8231
r4         612.901   4101.838    479.296   0.782   3933.765    0.959   505.203   0.8243  3550.251    0.8655  494.468   0.8068  3624.769    0.8837
r5         1013.37   6938.616    768.672   0.7585  6424.236    0.9259  807.773   0.7971  5538.822    0.7983  790.973   0.7805  5612.758    0.8089
Primary1   169.102   38.412      124.731   0.7376  57.195      1.489   131.744   0.7791  42.847      1.1155  130.898   0.7741  44.059      1.147
Primary2   417.13    220.591     308.293   0.7391  209.02      0.9475  327.941   0.7862  167.037     0.7572  324.382   0.7777  167.721     0.7603
s1423      6.914     6.518       5.31      0.768   8.317       1.276   5.815     0.841   5.799       0.8897  5.761     0.8332  5.871       0.9007
s5378      17.569    14.459      12.839    0.7308  15.598      1.0788  14.038    0.799   11.204      0.7749  13.887    0.7904  11.265      0.7791
s15850     65.155    43.517      48.416    0.7431  46.194      1.0615  52.137    0.8002  32.254      0.7412  52.051    0.7989  32.26       0.7413
Average                                    0.7506              1.0725            0.8042              0.8502            0.7956              0.8606


Table VI. Comparison of T-type1 vs. single voltage-island X-clock trees, considering double via insertion, in power, delay, and CPU time.


Benchmark  Total via                  Power (mW)                   Delay (ns)                   CPU time (s)
           Single   T-type1  Ratio    Single    T-type1   Ratio    Single    T-type1   Ratio    Single   T-type1  Ratio
r1         2335     2355     1.0086   80.237    62.207    0.7753   278.317   288.858   1.0379   16.056   2.664    0.1659
r2         5423     5398     0.9954   195.091   150.909   0.7735   858.636   921.493   1.0732   87.101   6.198    0.0712
r3         7671     7937     1.0347   261.996   217.006   0.8283   1453.014  1336.342  0.9197   16.347   6.838    0.4183
r4         17567    18322    1.0430   612.901   495.312   0.8081   4101.838  3972.278  0.9684   194.833  48.904   0.2510
r5         28111    27530    0.9793   1013.366  808.245   0.7976   6938.616  7091.42   1.0220   523.916  235.56   0.4496
Primary1   2317     2214     0.9556   169.102   135.808   0.8031   38.412    42.198    1.0986   2.489    1.723    0.6923
Primary2   4494     4285     0.9535   417.13    329.334   0.7895   220.591   175.559   0.7959   11.109   5.226    0.4704
s1423      673      662      0.9837   6.914     5.44      0.7868   6.518     6.98      1.0709   0.142    0.18     1.2676
s5378      1469     1560     1.0620   17.569    14.112    0.8032   14.459    15.068    1.0421   4.733    0.204    0.0431
s15850     5179     5110     0.9867   65.155    50.397    0.7735   43.517    42.714    0.9816   11.052   3.5      0.3167
Average                      1.0002                       0.7939                       1.0011                     0.4146


Table VII. Comparison of T-type2 vs. single voltage-island X-clock trees, considering double via insertion, in power, delay, and CPU time.


Benchmark  Total via                  Power (mW)                   Delay (ns)                   CPU time (s)
           Single   T-type2  Ratio    Single    T-type2   Ratio    Single    T-type2   Ratio    Single   T-type2  Ratio
r1         2335     2384     1.0210   80.237    65.776    0.8198   278.317   285.819   1.0270   16.056   4.818    0.3001
r2         5423     5533     1.0203   195.091   140.134   0.7183   858.636   823.275   0.9588   87.101   3.558    0.0409
r3         7671     7685     1.0018   261.996   205.05    0.7827   1453.014  1378.38   0.9486   16.347   8.444    0.5166
r4         17567    17486    0.9954   612.901   479.296   0.7820   4101.838  3933.765  0.9590   194.833  68.26    0.3504
r5         28111    28162    1.0018   1013.366  768.672   0.7585   6938.616  6424.236  0.9259   523.916  263.118  0.5022
Primary1   2317     2288     0.9875   169.102   131.744   0.7791   38.412    42.847    1.1155   2.489    2.676    1.0751
Primary2   4494     4597     1.0229   417.13    308.293   0.7391   220.591   209.02    0.9476   11.109   7.25     0.6526
s1423      673      651      0.9673   6.914     5.761     0.8332   6.518     5.871     0.9007   0.142    0.296    2.0845
s5378      1469     1519     1.0340   17.569    13.887    0.7904   14.459    11.265    0.7791   4.733    0.182    0.0385
s15850     5179     5129     0.9904   65.155    52.051    0.7989   43.517    32.26     0.7413   11.052   6.789    0.6143
Average                      1.0042                       0.7802                       0.9304                     0.6175